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1.  INTRODUCTION 


Statistical  methodology  has  long  been  a  familiar  tool  for 
use  in  understanding  our  natural  environment.  Classical  examples 
of  applications  of  statistics  are  seen  in  weather  forecasting, 
in  evaluation  of  attempts  at  weather  modification  by  cloud  seed¬ 
ing,  and  in  descriptions  of  the  fluctuations  in  the  sea  surface. 

Now  the  accessibility  of  new  and  extensive  data  from  a  variety 
of  remote  sensing  sources,  such  as  earth  orbiting  and  geostationary 
satellites,  again  calls  for  the  development  and  application  of 
appropriate  statistical  methodology.  Classical  methods  of  statis¬ 
tics  and  of  probability  modeling  frequently  must  be  adapted  to 
the  new  needs.  The  process  of  adaptation  will  proceed  most 
efficiently  if  statisticians  work  cooperatively  with  the  scientists 
actually  obtaining  data  and  studying  the  associated  natural 
phenomena.  Conferences  such  as  PRIMARS  I  are  of  great  value  in 
promoting  the  necessary  interchange  of  information  and  the  stimulus 
to  approach  novel  and  difficult  problems  in  a  realistic  manner. 

This  paper  describes  new  approaches  to  the  analysis  of 
data,  in  particular  to  quite  "noisy "  data  of  the  sort  that  is 
likely  to  be  encountered  when  observing  the  natural  environment. 

The  descriptions  given  will  necessarily  be  brief,  but  an  attempt 
will  be  made  to  show  how  the  methods  and  viewpoints  presented  may 
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2.  ROBUST  METHODOLOGY:  REQUIREMENTS  AND  POSSIBILITIES 

Many  scientists  who  have  closely  examined  real  data 
have  encountered  occasional,  or  even  frequent,  anomalous  behavior. 
Apparent  anomalies  in  data  may  be  with  respect  to  either 

(a)  preconceptions  as  to  "proper"  data  behavior,  these  perhaps 
being  buttressed  by  (physical)  theory,  or 

(b)  the  nature  of  the  general  pattern  of  the  data,  especially 
those  data  points  in  the  immediate  neighborhood,  e.g., 

in  time  or  space. 

2.1.  Plots 

In  simple  circumstances  graphical  plots  will  quickly  reveal 
those  points  that  are  blatant  anomalies.  For  instance  suppose  that 
one  wishes  to  investigate  data  concerning  the  relationship  between 
wind  velocity  and  whitecap  cover  in  the  ocean.  Theory  may  suggest 
a  specific  relationship,  e.g.  that  white-cap  cover,  C,  be  nearly 
a  cubic  function  of  wind  velocity  v,  so  that  it  will  be  tempting 
to  plot  C  vs  v  and  note  an  appearance  as  shown  on  Fig.  1;  there 
solid  black  dots  represent  (simulated)  raw  data.  Since  the  eye 

2 

finds  it  difficult  to  distinguish  curves  of  the  form  C  =  a v  , 

3  7/2 

C  =  av  ,  C  =  av  ,  etc.,  *roir  one  another,  and  yet  is  sensitive  to 

3 

departures  from  linearity,  a  graph  of  C  vs  v  suggests  itself,  but 
is  not  included  here.  A  plot  on  log-log  paper  may  be  still  better. 


2 


As  presented,  the  data  conforms  in  general  to  the  theorized  relation¬ 
ship  or  scaling,  with  the  obvious  exception  of  the  circled  point  to 
the  right.  Such  an  anomalous  point,  or  points,  represents  a  challenge 
both  to  statistical  technology  and  to  the  ultimate  user  of  the  data. 

Statistical  technology  assumes  the  responsibility  for  revealing 
the  presence  of  such  points,  and,  if  possible,  for  providing  a 
meaningful  and  useful  summary  of  the  remaining  points.  It  falls 
to  the  consumer  or  ultimate  user  of  the  data,  preferably  with 
the  help  of  a  subject-matter  specialist  (physicist  or  oceanographer) 
to  interpret  the  apparently  anomalous  maverick — or  exotic,  or 
outlying — data  point:  is  it 

3 

(i)  an  evidence  of  the  failure  of  the  relation  C  =  av  ,  say 
for  large  velocities, 
or  is  it 

(ii)  an  outright  error  in  data  recording,  and  to  be  disregarded? 

being  just  two  possible  options. 

Note  that  simple  graphs  are  invaluable  for  pointing  out 
extreme  outliers  in  simple,  one  explanatory  variable,  situations. 

If  more  variables  are  required,  informative  plots  are  more  diffi¬ 
cult  without  the  use  of  more  statistical  technology.  We  next 
show  that  classical,  least-squares,  technology  may  be  quite  mis¬ 
leading,  but  that  replacements  are  available.  See  Mosteller  and 
Tukey  (1977) ,  abbreviated  MT  hereafter. 


2.2.  Fits  and  Residual  Plots 


Suppose  that  one  wishes  to  summarize  data  such  as  that 
in  Fig.  1  by  fitting  the  relationship  C  =  av3,  i.e.,  determining 
the  parameter  a  from  the  data.  The  classical  and  automatic  way 
of  doing  so  is  to  apply  least  squares;  computer  programs  are  uni¬ 
versally  available,  even  for  handheld  calculators  (the  TI  59,  or 
HP  67).  What  are  we  likely  to  find?  A  least-squares  line  (treat- 

3 

xng  w  =  v  as  the  independent  variable  presents  C  =  aw;  one 
can  also  plot  and  fit  C3//3  =  (aj^^v,  and  there  may  be  reasons 
for  this  choice)  is  quite  apt  to  fatally  misrepresent  the  situation, 
responding  much  too  sensitively  to  the  single  (he~e  encircled) 
outlying  value,  and  straying  systematically  away  from  the  main 
body  of  the  data;  see  the  points  represented  by  o  in  Fig.  1. 

An  alternative  method  for  fitting,  described  in  MT,  is 
less  susceptible  to  outlier  influence — is  far  more  robust  to 
departures  from  basic  assumptions — than  is  the  ordinary  least 
squares  (OLS)  method.  This  new  method',  termed  biweight  fitting, 
is  carried  out  by  a  procedure  that  uses  the  OLS  computation  itera¬ 
tively.  In  the  course  of  the  computations  weights  are  auto¬ 
matically  developed  that  reduce  the  influence  of  the  encircled 
value  of  Fig.  1,  permitting  the  fit  to  more  closely  approximate 
the  main  body  of  the  data.  We  now  describe  and  illustrate  the 
biweight  fitting  procedure  as  it  is  adapted  to  the  problem  of 
determining  the  parameter  a  in  the  relation  y^  vs  ax^. 
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Biweight  Fitting  Calculation 

(1)  Compute  the  kth  (k  =  1,2,3,...)  iterative  estimate  of  a, 

(k) 

denoted  by  a  by  solving 
n 


1  (y.  -  a<k»x1)xiwf-1»  =  0 

i-1  1  ill 


to  obtain 


£  y .  x .  w . 
(k)  i=l  1  1  1 


(k-1) 


n 


l  xfw . 
i=l  1  1 


2...  (k-1) 


(k-1 ) 

(2)  the  weights,  w^  ,  are  of  this  form: 


.(k-1) 


1  - 


y,  -  »(k-llx„« 


cS 


if  (*)  <  1 


=  0 


if  (•)  >  1 


where  (•)  refers  to  the  term  [  (y^a  (k-1)  x^)  /s  (k_1J  ] ; 

(k-1) 

S  is  a  scale  factor  (robust  replacement  for  the 

standard  deviation)  that  may  be  computed  in  the  following  manner, 


s  t 

(3)  The  k-1  iterated  value  of  the  scale  factor  is 


S(k  ^  =  mediant |y.  -  1^x. I), 


c  being  a  constant  of  value  6,  or  9 


(4)  the  first  value,  a  ^  ,  of  the  iterative  sequence  can  be 
obtained  by  equalizing  all  weights  =  1),  which  is 

equivalent  to  OLS;  alternatively,  one  can  utilize  a  "robust 
start,"  suggestions  for  which  can  be  found  in  MT. 

The  iteration  is  carried  on  until  the  difference  between  successive 
values  is  small;  usually  4  to  8  iterations  is  sufficient.  The 
resulting  a-estimate  can  be  denoted  by  a. 

Following  the  fitting  it  is  informative  to  plot  the 
residual  values: 


r^  =  y^  -  ax^  =  yi  -  ,  i  =  1,2,. ,.,n, 

/\ 

being  shorthand  for  the  predicted  y  value.  In  case  there 
is  a  single  outlier,  as  in  Fig.  1,  the  fitted  line  will  tend  to 
hug  the  major  point  cloud,  and  a  histogram  of  the  residuals  will 
dramatically  reveal  the  presence  of  the  outlier,  suggesting 
further  investigation.  A  plot  of  r^  vs  y^  is  also  useful.  See 
MT  for  further  suggestions. 

2.3.  Numerical  Illustration 

The  following  are  a  set  of  (simulated)  whitecap  percentages 
and  corresponding  wind  velocities.  Alongside  are  values  for 
white  cap  coverage  estimated  by  OLS  and  by  the  biweight  procedure. 


V -r 


i- 


Velocity 

Cover 

("Actual") 

Cover 

(OLS  Estimate) 

Cover 

(Robust  Estimate) 

2 

0.011 

0.011 

0.0067 

7 

0.63 

0.49 

0.29 

10 

1.30 

1.43 

0.84 

15 

3.89 

4.82 

2.83 

18 

2.89 

8.33 

4.88 

21 

8.16 

13.2 

7.76 

24 

,25^7 

19.7 

11.6 

It  is  clear  from  the  above  table,  and  perhaps  clearer  from  Fig.  1, 
that  the  OLS  solution,  in  its  attempt  to  fit  the  point  C(24)  =  25, 
systematically  and  considerably  over-estimates  the  points  at  v  =  15 
and  above.  The  biweight  estimator  performs  much  better,  allowing 
a  closer  fit  to  all  data  other  than  C(24) .  A  residual  plot  brings 
attention  to  bear  on  that  point. 

Since  the  values  of  "Actual  Cover"  were  actually  constructed 

3 

by  forming  0.008v  and  adding  Gaussian  random  noise  with  value  pro¬ 
portional  to  C (v) ,  and  since  the  sequence  of  values  of  0.008v3 
were  0.0064,  0.27,  0.80,  2.7,  4.67,  7.41,  11.06,  we  cannot  fault 
the  manner  in  which  the  biweight  procedure  functioned  in  this 
example  and  are  encouraoed  to  use  it  more  widely. 
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2.4.  Possible  Application  to  Remote  Sensing  Data 

In  a  paper  in  this  conference  proceedings  by  Depriest 
(1979),  and  in  Fleming  (1979),  a  problem  arising  from  partial 

i 

cloud  cover  contamination  of  remote  sensing  data  is  described 
and  addressed.  This  problem  has  the  following  origin.  A  series 
of  measurements  are  made  on  a  physical  quantity  (sea  surface 
temperatures)  but  are  contaminated.  That  is,  in  the  case  of 
sea  surface  temperatures,  if  no  clouds  are  present  the  measure¬ 
ments  are  approximately  normally  distributed  around  y  (the  true 
temperature) .  However,  if  clouds  are  present  a  fraction  of  the 
measurements  are  made  artificially  smaller,  cloud  temperatures 
being  lower  than  those  at  earth  surface.  The  problem  is  to  esti¬ 
mate  y.  Techniques  for  doing  so  are  described  by  Depriest  (1979) 
and  by  Fleming  (1979) .  We  describe  a  possible  alternative  approach 
that  uses  robust  regression.  Operational  characteristics  of 
the  two  procedures  have  not  yet  been  compared. 

(1)  Arrange  the  measurements  in  order:  y^  <  y^  <  y^  <  • • •  < 
yn-i  <  v  The  largest  observations  may  well  appear 
similar  to  the  largest  order  statistics  of  a  normal  distri¬ 
bution  with  (unknown)  mean  y  and  standard  deviation  a 
(sometimes  assumed  known,  although  caution  is  in  order) , 
while  the  smaller  ones  are  likely  to  depart  systematically. 

(2)  Carry  out  a  preliminary  plot  of 

Vk  ^  '  k  =  n,  n-1,  n-2,  ... 

8 


By®*?  *8^ 


\ 


where  *  * (p)  is  the  inverse  function  of  the  unit  normal: 
recall  that  if 

*(y)  =  /  exp(-|  z2)  —■?—  , 

*  /2tt 

is  the  unit  normal  distribution,  then  the  solution  of  the 
equation  $ (y)  =  p  gives 

y  (p)  =  *  1  (p) ; 


♦  ,  and  hence  $  ^ ,  are  widely  tabulated.  Alternatively, 
use  Arithmetic  Probability  Paper.  If  y^  is  an  ordered 
observation  from  a  normal  population,  then  the  plot  should 
appear  straight,  while  a  systematic  departure  from  linearity 
indicates  a  departure  from  normality.  Suppose  departures 
begin  to  occur  at  k  =  D;  sometimes  D  may  be  greater  than 
n/2.  One  may  first  eye-fit  a  straight  line  to  the  points 
k  =  n,  n-1,  ...  ,  D.  Then  yn/2  =  ^  (estimated  temperature), 
i.e.  the  value  of  the  fitted  line  at  n/2  should  give  a 
reasonable  value  for  y. 

(3)  Going  further  in  a  formal  direction,  one  may  wish  to  fit  a 
line  to  the  data  points.  Here  a  biweight  fit  should  behave 
well,  tending  to  be  oblivious  to  spurious  (cloud  contamina¬ 
tion)  points.  One  can  proceed  to  fit  the  relation 


with 


yk  vs  u  +  oxk 


x  =  ^  ) 

xk  ln+r 


k  =  n,  n-1 ,  n-2 ,  . . .  ; 
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a  start  using  the  eye-fit  to  points  k  =  n,  n-1,  ...  , 
may  be  worthwhile.  Finally,  quote  the  estimate 


D 


D  -  med  £  (ya/2  +  y(n/2)+i>. 

*  y(n+l)/2' 

where 


n  even ; 

n  odd  . 


yk  -  u  +  oxk  . 

The  above  procedure  seems  worth  further  investigation  and  refine¬ 
ment.  One  important  step  may  be  to  adjust  for  the  effect  of 
correlation  between  order  statistics  when  carrying  out  the 
regression. 


3.  SMOOTHING  DATA 

If  one  plots  certain  environmental  data,  e.g.  monthly 
total  rainfall,  or  perhaps  daily  maximum  temperature,  at  a 
particular  location,  systematic  regularities  seem  to  appear, 
but  may  be  masked  by  noise.  Often  there  is  a  seasonal  pattern, 
i.e.  one  that  is  roughly  cyclic  in  nature.  Attempts  to  fit  such 
a  pattern  with  polynomials  is  doomed  to  failure,  and  selection 
of  a  set  of  sines  and  cosines  that  does  well  (Fourier  series) 
may  lead  to  many  terms.  Some  method  of  smoothing  the  original 
series  that  lays  bare  the  regularities  is  to  be  desired.  After 
such  is  made  available,  one  can  study  the  residuals  around  it. 
Spectral  analysis  or  some  such  formal  procedure  may  then  be  of 
use. 

Classical  smoothing  procedures  involve  some  form  of  moving 
average,  and  are  susceptible  to  the  python-swallowing- the  pig 
difficulty:  imagine  using  the  linear  smoothing  operation 


on  the  yfc  series 


t 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

yt 

11 

9 

7 

8 

8 

29 

10 

8 

6 

9 

Syt 

© 

9 

8 

7.7 

15 

15^7 

15.7 

8 

7.7 

© 

Ryt 

© 

9 

8 

8 

8 

10 

10 

8 

8 

© 
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clearly  the  entry  of  29  at  t  =  5  into  the  smoothed  value 
series  Syfc  gives  a  serious  distortion,  travelling  as  it  does 
in  partially  digested  form  through  the  next  two  terms  of  the 
smoothed  series.  If  further  smoothing  is  attempted  the  bulge 
is  reduced  slightly,  but  spreads  out  in  time. 

On  the  other  hand,  the  non-linear  operation  of  taking 
running  medians,  as  suggested  by  Tukey  (1977)#  performs  effec¬ 
tively  (in  both  cases  circled  and  values  are  copied  from  the 
original  series;  more  sophisticated  procedures  can  be  invented 
as  well) .  The  last  row  in  the  table,  labelled  Ryfc,  gives  the 
result  of  this  robust  smoothing;  note  that  it  behaves  in  an 
intuitively  appealing  manner,  essentially  ignoring  the  outlying 
value  29.  Further  steps  can  be  taken  to  improve  the  "smooths," 
but  we  refer  to  Tukey  (19  77  )  Chapters  7  and  16  for  details. 

Two  further  points  may  be  made.  The  first  is  that  the 
analysis  of  a  sequence  of  data  points,  and  their  projection  or 
forecasting  in  space  or  time,  should  not  end  with  providing  a 
smoothed  or  averaged  version.  Examination  of  the  remaining  vari¬ 
ation,  e.g.  the  sequence  y  -  Ryfc,  called  the  "rough"  by  Tukey 
may  well  be  rewarding;  presence  and  suggestiveness  of  various 
outliers  is  much  more  evident  in  the  rough  (residuals)  sequence 
than  in  the  original  sequence  of  data  points.  Secondly,  the 
procedure  described  for  smoothing  simple  sequences  of  data  points 
must  be  adapted  to  planar  (two-dimensional)  data;  some  work  has 
been  done,  but  much  remains. 


4.  STOCHASTIC  MODELING  OF  ICE  PRESSURE  RIDGES 

The  dynamics  of  ice  formation  in  the  Earth's  cold  regions 
results  in  the  development  of  irregular  ice  pile-ups,  or  pressure 
ridges.  These  ridges  occur  in  an  apparently  random  fashion  in 
space;  in  fact  the  following  regularities  are  observed  by  remote 
sensing  methods  (courtesy  of  Dr.  W.  Weeks,  in  a  seminar  at  the 
Naval  Postgraduate  School,  Monterey,  California,  Winter,  1979; 
see  also  Weeks,  et  al.  (1979)): 

1)  Along  a  sampling  line  (e.g.  airplane  flight  path,  or  straight 
submarine  track)  ice  ridges  seem  to  appear  in  accordance 
with  a  stationary  Poisson  process,  so  if  R(x)  is  the  number 
of  such  ridges  encountered  over  a  distance  x,  then  approxi¬ 
mately 

P{R(x)  =  n}  =  e-Ax  -(A£i-  ,  n  =  0,1,2,... 

n  • 

where  \  >  0  is  the  density  of  ice  ridges. 

2)  The  probability  distribution  of  ridge  "sail  heights"  (or 

"keel  depths")  may  be  approximated  by  the  forms  F(y)  =  l-e-yy 
-vy2 

or  1  -  e  1  ;  the  best-fitting  distribution  may  well  depend 
upon  the  method  of  observation  (averaging  properties) . 

For  further  details  see  work  referenced  in  Weeks  et  al.  (1979)  . 

Now  it  may  be  of  interest  to  compute  the  distribution 
of  the  maximum  sail  height,  or  keel  depth, that  one  is  to  encounter 
over  a  course  of  length  x.  This  is  very  simple,  given  the 
particular  distributions  of. sail  number  and  size  and  furthermore 


assuming  independent  between  ridge  heights.  Let  H(x)  be  the 
maximum  sail  height;  then 

oo 

P{H  (x)  <  y}  =  l  e-Xx  [F(y)]n 

n=0  n* 

since  all  of  the  Poisson-distributed  heights  must  be  less  than  y 
in  order  for  the  maximum  to  be  below  y. 

Sum  out  to  obtain 

P(H(x)  £  y}  =  exp{-Xx[l-F(y)  ]> 

Depending  upon  which  distribution  is  picked  for  ridge  heights, 
we  get 

a)  P(H(x)  £  y)  =  exp(-Axe-yy) 

2 

b)  P(H(x)  £  y}  =  exp(-Axe”vy  ) 

These  closely  resemble  classical  extreme  value  distributions. 

Note  that  if  logs  are  taken  simplicity  occurs: 

a')  in  P(H(x)  £  y)  =  -Axe  yy; 

£n(-in  P{H(x)  £  y } )  =  £n(Ax)  -  uy 

2 

b')  in  P{H(x)  £  y}  ■  -Axe”vy  ; 

£n(-£n  p{H(x)  £  y})  ■  £n(Ax)  -  vy^  . 
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If  either  of  these  formulas  are  to  be  used  for  practical  purposes, 
values  of  the  parameters  must  be  obtained.  In  order  to  estimate 
parameters  X,  u,  \>  in  the  above  models  from  data  one  naturally 
thinks  of  the  method  of  maximum  likelihood.  Suppose  that  we 
have  observed  R(x)  =  n  ridges  of  heights  y^,  y2*  •••  >  Yn* 
Then  the  maximum  likelihood  estimates  are 


*  “  n  ' 


_1_ 

~2 


where  as  usual  we  have  put 


7  _  1  f 
n  iii  Yi 


Hence  our  estimates  are  of  the  form 

^  A  A 

a")  est  Uni-in  P{H(x)  £  y} )  =  l n  A  +  in  x  -  py 
b")  est  i,n(-£n  P{H(x)  £  y} )  =  in  X  +  &n  x  -  vy2 

If  rather  large  samples  are  available  and  if  distributional  assump¬ 
tions  are  well  satisfied  one  may  feel  comfortable  with  conventional 
standard  errors  based  on  Fisher  information  and  normality;  see 
Crame'r  (1946).  On  the  other  hand,  it  is  of  interest  to  apply 
the  jackknife  technique  (see  R.  G.  Miller  (1974)  for  a  review)  to 
obtain  estimates  of  the  variance  of  estimate  due  particularly  to 

the  ridge  heights.  To  carry  out  the  calculation,  (i)  compute 
A  — 2 

vall  =  n/y  ;  then  (ii)  compute 


A 

V 


<-j) 


“2 - T~ 

*1  +  y2  + 


n-1 

T 


>+  yj-l  + 


0  +  y 


j+1 


+  *••+  Yr 


I 


for  j  =  1,2,..., 


n;  then  (iii)  compute  the  pseudovalues 


v.  »  nvftll  -  (n-1)  v^  .),  and  (iv) 
point  estimate  vJK  =  (1/n)  J\=1 


average  to  obtain  a  jackknifed 
and  its  variance 


Then  we  can  estimate  the  standard  error  of  the  probability  prediction, 
e.g.  b")  by  computing 


A  similar  calculation  is  easily  performed  for  model  a) ;  details 
are  omitted.  From  the  above  results,  approximate  confidence  inter¬ 
vals  may  be  constructed  for  the  probability  of  encountering  a 
(maximum)  ridge  sail  height  less  than  y  in  magnitude. 

Fairly  recent  theoretical  results  of  Efron  and  Hinkley 
(1978)  suggest  that  if  a  traditional  maximum  likelihood  approach 
is  taken,  one  is  better  off  using  observed  Fisher  information 
rather  than  expected  Fisher  information  in  order  to  establish  an 


approximate  standard  error  in  either  case  a)  or  b) .  However, 
work  of  Reeds  (1978)  suggests  that  use  of  the  jackknife  in 


conjunction  with  maximum  likelihood  yields  results  that  tend 


16 


to  be  rather  independent  of  the  basic  model  chosen.  Both  of 
these  suggestions  must  be  validated  by  further  work,  a  good 
deal  of  which  will  necessarily  involve  Monte  Carlo  simulation. 
Such  work  should  be  of  great  importance  and  interest  to  those 
who  must  assess  the  probabilities  of  extreme,  rare,  events,  and 
who  furthermore  wish  to  provide  some  reasonably  valid  estimates 
of  the  error  of  their  estimates. 

Acknowledgment.  The  writer  is  much  indebted  to  LCDR  C.  F. 
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